
Add logs and extend metrics for graphql_validation_mode: both #3674

Merged
merged 13 commits into dev from renee/log-query-validation-errors
Sep 8, 2023

Conversation

goto-bus-stop
Member

@goto-bus-stop goto-bus-stop commented Aug 25, 2023

This adds logging for query validation errors with either Rust or JS when there is a mismatch, i.e. one of them validates but the other does not. In other cases we are not really interested in the specific error (it will just go back to the user), so we don't need to log there.
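The mismatch-only logging described above can be sketched in a few lines. This is a hypothetical illustration, not the router's actual code: `Validation` and `log_if_mismatch` stand in for the real validation outputs, and the real implementation would log through `tracing` rather than stderr.

```rust
// Hypothetical sketch: log only when the two validators disagree.
// Agreement (both pass, or both fail) needs no log, since a shared
// failure just goes back to the user anyway.

#[derive(Debug)]
enum Validation {
    Ok,
    Err(String),
}

/// Returns true (and logs) only when Rust and JS validation disagree.
fn log_if_mismatch(rust: &Validation, js: &Validation) -> bool {
    let rust_ok = matches!(rust, Validation::Ok);
    let js_ok = matches!(js, Validation::Ok);
    if rust_ok != js_ok {
        // In the router this would go through `tracing`, not stderr.
        eprintln!("validation mismatch: rust={:?}, js={:?}", rust, js);
    }
    rust_ok != js_ok
}
```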

To log the Rust validation error well, I now store the ApolloDiagnostics that were produced on `Query`. `Query` is serializable for caching, but `ApolloDiagnostic` is not. Here I just skipped serializing `ApolloDiagnostic`, so if a `Query` is loaded from cache, it does not have the validation error stored. I'm not sure this is the right thing to do. The ApolloDiagnostics are later used after query planning (which may produce a JS validation error). So it's correct if we can more or less safely assume that we only have valid `Query` instances cached. Otherwise we might get spurious error logs from this.

  • So is that a safe assumption? Reading the CachingQueryPlanner implementation, I think it only stores errors (in which case there is no `Query` instance) and fully successful planning results (in which case both Rust and JS validation have already run). So it looks fine, but relying on this could be a bit brittle.
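The caching behaviour in question can be sketched with std only. This is an illustrative model, not the router's real serde-based serialization: the point is that the diagnostics field is deliberately dropped on the way into the cache, so a `Query` loaded from cache never carries a stored validation error.

```rust
// Illustrative sketch of the cache round-trip. `diagnostics` stands in
// for the non-serializable ApolloDiagnostic and is dropped on write.

#[derive(Debug, Clone)]
struct Query {
    body: String,
    /// Never written to the cache (mirrors `#[serde(skip)]`).
    diagnostics: Option<Vec<String>>,
}

impl Query {
    /// Serialize for the cache, skipping `diagnostics`.
    fn to_cache(&self) -> String {
        self.body.clone()
    }

    /// Deserialize from the cache; diagnostics are always absent here,
    /// which is only correct if cached queries are always valid.
    fn from_cache(cached: &str) -> Self {
        Query {
            body: cached.to_string(),
            diagnostics: None,
        }
    }
}
```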

I also simplified the validation error printing, which:

- [x] depends on apollographql/apollo-rs#630.
- [x] and on #3675

Closes #3681

@github-actions
Contributor

@goto-bus-stop, please consider creating a changeset entry in /.changesets/. These instructions describe the process and tooling.

@router-perf

router-perf bot commented Aug 25, 2023

CI performance tests

  • step - Basic stress test that steps up the number of users over time
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • const - Basic stress test that runs with a constant number of users
  • reload - Reload test over a long period of time at a constant rate of users
  • large-request - Stress test with a 1 MB request payload
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • no-graphos - Basic stress test, no GraphOS.

@Geal Geal requested review from a team, garypen, Geal, bnjjj and SimonSapin and removed request for garypen August 30, 2023 14:21
@abernix abernix removed the request for review from bnjjj September 1, 2023 08:23

```rust
self.errors.iter().for_each(|err| {
    // Outputs a pretty colourised report on TTYs
    eprintln!("{err}");
});
```
Contributor

We should really gate this entire function to test scope. We should never call println or variants in prod code. Otherwise if this is something that should be used then it should use tracing.
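A std-only sketch of the gating this comment suggests. The names here are illustrative, and the production path would go through the `tracing` crate (omitted so the sketch stays dependency-free):

```rust
// Sketch: keep the pretty stderr report out of production code paths
// by compiling it only for tests; production callers get the raw
// messages to route through `tracing` (or to bubble up, as the PR
// ended up doing).

struct Diagnostics {
    errors: Vec<String>,
}

impl Diagnostics {
    /// Pretty report on stderr: compiled only under `cfg(test)`, so
    /// no `eprintln!` can reach a production build.
    #[cfg(test)]
    fn print_report(&self) {
        for err in &self.errors {
            eprintln!("{err}");
        }
    }

    /// Production path: hand the messages to the caller, who logs
    /// them via `tracing::error!` or returns them in an error.
    fn messages(&self) -> Vec<String> {
        self.errors.clone()
    }
}
```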

Member Author

Makes sense! I've removed this function as it is only used in one place (supergraph schema validation error) and that already bubbles up the error. The bubbling up lost the actual underlying error message as described in #3681, so I've changed that in 525f376.

@abernix abernix requested a review from BrynCooke September 8, 2023 08:25
@goto-bus-stop goto-bus-stop changed the title from "Log query validation errors when graphql_validation_mode: both" to "Add logs and extend metrics for graphql_validation_mode: both" Sep 8, 2023
BrynCooke
BrynCooke approved these changes Sep 8, 2023
@goto-bus-stop goto-bus-stop merged commit b6164b3 into dev Sep 8, 2023
@goto-bus-stop goto-bus-stop deleted the renee/log-query-validation-errors branch September 8, 2023 13:03
garypen pushed a commit that referenced this pull request Sep 12, 2023
Successfully merging this pull request may close these issues.

Failure to parse schema does not output any information on where the parse failure occurred
3 participants